LONGL-Net: temporal correlation structure guided deep learning model to predict longitudinal age-related macular degeneration severity

Abstract
Age-related macular degeneration (AMD) is the principal cause of blindness in developed countries, and its prevalence is projected to reach 288 million people by 2040. Automated grading and prediction methods can therefore be highly beneficial for identifying subjects susceptible to late AMD and enabling clinicians to take preventive action. Clinically, AMD severity is quantified from color fundus photographs (CFPs) of the retina, and many machine-learning-based methods have been proposed for grading AMD severity. However, few models have been developed to predict longitudinal progression, i.e., to predict future late-AMD risk from the current CFP, which is of greater clinical interest. In this paper, we propose a new deep-learning-based classification model (LONGL-Net) that simultaneously grades the current CFP and predicts the longitudinal outcome, i.e., whether the subject will have late AMD at a future time point. We design a new temporal-correlation-structure-guided Generative Adversarial Network (GAN) model that learns the interrelations of temporal changes in CFPs at consecutive time points. By forecasting AMD symptoms in future CFPs, this model also makes the classifier's decisions interpretable. We used about 30,000 CFP images from 4,628 participants in the Age-Related Eye Disease Study. The task is a 3-class classification problem that simultaneously grades the AMD condition at the current time point and predicts late-AMD progression at the future time point. On this task, our classifier achieved an average AUC of 0.905 (95% CI: 0.886–0.922) and an average accuracy of 0.762 (95% CI: 0.733–0.792). We further validated our model on the UK Biobank dataset, where it achieved an average accuracy of 0.905 and a sensitivity of 0.797 in grading 300 CFP images.
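The abstract frames the task as a single 3-class problem that combines the current grade with the future outcome. The class definitions are not spelled out here, so the sketch below assumes one plausible encoding. Subjects without late AMD now are split by whether they have late AMD at the future visit, and a current late-AMD grade overrides the future label. The class names and this precedence rule are hypothetical.

```python
from enum import IntEnum


class Outcome(IntEnum):
    # Hypothetical class names; the source only states that there are 3 classes.
    NO_LATE_AMD = 0    # no late AMD now and none at the future visit
    PROGRESSES = 1     # no late AMD now, late AMD at the future visit
    LATE_AMD_NOW = 2   # late AMD already present at the current visit


def make_label(late_now: bool, late_future: bool) -> Outcome:
    """Combine current and future late-AMD status into one 3-class target."""
    if late_now:
        # Assumed: a current late-AMD grade takes precedence over the future label.
        return Outcome.LATE_AMD_NOW
    return Outcome.PROGRESSES if late_future else Outcome.NO_LATE_AMD


print([int(make_label(now, fut)) for now, fut in [(False, False), (False, True), (True, True)]])
# prints [0, 1, 2]
```

With this encoding, a single softmax head can both grade the current image (class 2 vs. the rest) and flag future progression (class 1 vs. class 0).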


S1. Dataset Statistics after Down-sampling
Table S1 summarizes the number of pairs with a 2-, 3-, or 4-year gap between the first time point and the future one. It covers the training, validation, and test sets used in our experiments for the classifier and GAN models. In each experiment, the data partitions are disjoint at the subject level, i.e., all pairs for a given participant are present in only one partition.
Table S1: Statistics of the dataset used for training the classifier after down-sampling. The numbers in parentheses in columns 4–6 give the histogram of the partition in each row. As the maximum follow-up length is 13 years, the largest possible value in the visit pairs is 26.
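A subject-level split can be sketched as follows. Subjects, not pairs, are shuffled and assigned to partitions, so every pair belonging to one participant lands in the same partition. The split fractions, the seed, and the `(subject_id, first_visit, future_visit)` tuple layout are assumptions for illustration. The example also assumes that visits are indexed in 6-month units, which would fit a 13-year follow-up having a maximum index of 26. Under that assumption, 2-, 3-, and 4-year gaps become 4, 6, and 8.

```python
import random
from collections import defaultdict


def split_pairs_by_subject(pairs, train_frac=0.7, val_frac=0.15, seed=0):
    """Partition visit pairs so that each subject appears in exactly one split.

    pairs: list of (subject_id, first_visit, future_visit) tuples.
    train_frac, val_frac: hypothetical proportions of *subjects*; the rest go to test.
    """
    by_subject = defaultdict(list)
    for pair in pairs:
        by_subject[pair[0]].append(pair)

    # Shuffle subject IDs (not pairs) with a fixed seed for reproducibility.
    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)

    n_train = round(train_frac * len(subjects))
    n_val = round(val_frac * len(subjects))
    groups = {
        "train": subjects[:n_train],
        "val": subjects[n_train:n_train + n_val],
        "test": subjects[n_train + n_val:],  # remainder goes to test
    }
    return {name: [p for s in ids for p in by_subject[s]] for name, ids in groups.items()}


# Hypothetical data: 20 subjects, each with pairs 2, 3 and 4 years apart
# (4, 6 and 8 visit indices, assuming one index per 6 months).
pairs = [(f"S{i:03d}", 0, gap) for i in range(20) for gap in (4, 6, 8)]
splits = split_pairs_by_subject(pairs)
for name, part in splits.items():
    print(name, len(part), "pairs from", len({p[0] for p in part}), "subjects")
```

Splitting at the pair level instead would let near-duplicate images of the same eye appear in both training and test data, which inflates test scores.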

S2. Pre-processed Data Examples
The following figures show some original images from the 2010 and 2014 batches of the dataset, together with the corresponding cropped and resized versions used in our experiments.

S3. GAN Model Details
Generator
We use the U-Net [1] architecture for the generator because its skip connections let it pass low-level structural information from its input to its output. This is easier than in an encoder-decoder architecture, where all information must pass through the bottleneck layer.
The number of channels in the generator's architecture (U-Net 256) [1,2]
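To make the role of the skip connections concrete, the sketch below tracks tensor shapes through a U-Net 256 generator. The encoder widths are assumed to be the pix2pix `unet_256` defaults (eight stride-2 stages starting at 64 channels). The 3 output channels for a color fundus image are also an assumption. Each decoder stage concatenates its upsampled features with the encoder features of the same resolution, so its channel count is twice that of a plain encoder-decoder stage.

```python
# Assumed encoder widths: pix2pix "unet_256" defaults (ngf = 64, 8 stride-2 stages).
ENCODER_CHANNELS = [64, 128, 256, 512, 512, 512, 512, 512]


def unet_shapes(image_size=256, encoder=ENCODER_CHANNELS, out_channels=3):
    """Return (channels, spatial size) after each encoder and decoder stage."""
    # Encoder: every stride-2 convolution halves the spatial size.
    skips, size = [], image_size
    for ch in encoder:
        size //= 2
        skips.append((ch, size))

    # skips[-1] is the bottleneck (512 x 1 x 1 for a 256 x 256 input).
    size = skips[-1][1]
    decoder = []
    for skip_ch, skip_size in reversed(skips[:-1]):
        size *= 2                    # transposed convolution doubles the size
        assert size == skip_size     # resolutions must match to concatenate
        up_ch = skip_ch              # decoder width mirrors the encoder stage
        decoder.append((up_ch + skip_ch, size))  # concatenate along channels
    decoder.append((out_channels, size * 2))      # final layer back to full size
    return skips, decoder


encoder_shapes, decoder_shapes = unet_shapes()
print(encoder_shapes[-1])   # (512, 1)
print(decoder_shapes[0])    # (1024, 2)
print(decoder_shapes[-1])   # (3, 256)
```

Without the concatenation, every decoder stage would see only features that had been squeezed through the 1 × 1 bottleneck. This is the limitation of plain encoder-decoder networks noted above.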

Discriminator
It has been observed [2,3] that a patch discriminator, which penalizes structure at the scale of image patches rather than the full image, can still produce sharp, realistic-looking generated images (i.e., preserve high-frequency information). It is also more computationally efficient, because the patches can be smaller than the image. As mentioned in the main paper, we used a 70 × 70 PatchGAN discriminator network. The numbers of channels in the discriminator's layers are as follows: → 128 → 256 → 512 → 1 (h × h).
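The "70 × 70" in the name refers to the receptive field of each output unit. The sketch below checks this by walking backward through the convolution stack. The kernel sizes, strides, padding, and the 64-channel first layer are assumed pix2pix defaults [2]. The 128 → 256 → 512 → 1 widths follow the text. The 256 × 256 input size is also an assumption, borrowed from the U-Net 256 generator. Under these assumptions, the discriminator outputs a 30 × 30 score map, and each score judges one 70 × 70 patch.

```python
# (kernel, stride, padding, out_channels) per conv layer. Kernel/stride/padding and
# the first layer's 64 channels are assumed pix2pix defaults; 128-256-512-1 follow the text.
PATCHGAN_LAYERS = [
    (4, 2, 1, 64),
    (4, 2, 1, 128),
    (4, 2, 1, 256),
    (4, 1, 1, 512),
    (4, 1, 1, 1),
]


def receptive_field(layers):
    """Input pixels seen by one output unit: r <- (r - 1) * stride + kernel, walking backward."""
    r = 1
    for kernel, stride, _, _ in reversed(layers):
        r = (r - 1) * stride + kernel
    return r


def output_size(n, layers):
    """Spatial size of the score map for an n x n input (standard conv arithmetic)."""
    for kernel, stride, padding, _ in layers:
        n = (n + 2 * padding - kernel) // stride + 1
    return n


print(receptive_field(PATCHGAN_LAYERS))    # 70
print(output_size(256, PATCHGAN_LAYERS))   # 30
```

Each score sees only a 70 × 70 region, and in pix2pix the adversarial loss is averaged over the map. The discriminator therefore judges local texture rather than global layout, which keeps it small.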

Network Initialization
We followed Isola et al. [2] for network initialization, i.e., the weights of both the generator and the discriminator are initialized from a Gaussian distribution with zero mean and a standard deviation of 0.02.
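The initialization rule amounts to drawing every weight independently from N(0, 0.02²). Below is a dependency-free sketch. The tensor shape is a hypothetical example of a 4 × 4 convolution with 3 input and 64 output channels.

```python
import math
import random
import statistics


def init_gaussian(shape, mean=0.0, std=0.02, seed=None):
    """Draw prod(shape) weights independently from N(mean, std**2), returned as a flat list."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(math.prod(shape))]


# Hypothetical conv weight: 64 output channels, 3 input channels, 4 x 4 kernel.
weights = init_gaussian((64, 3, 4, 4), seed=0)
print(len(weights), statistics.mean(weights), statistics.stdev(weights))
```

In PyTorch, the same draw for a layer `m` is `torch.nn.init.normal_(m.weight, mean=0.0, std=0.02)`.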

S6. Images from UK Biobank used in our Experiments
Here, we list the names of the images that we selected from UK Biobank to validate our model on an independent dataset.

S8. Classifier Structure with Age Information as an Additional Input
To explore whether explicitly providing age information to the model improves its performance, we concatenate the age value to the feature vector output by the global average pooling layer of the ResNet-18 [6] architecture. The procedure is shown in Figure S6.
Figure S14: The classifier architecture when both the fundus image and age information are used as inputs.
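The fusion step can be sketched with plain lists. The final ResNet-18 feature map is averaged per channel to a 512-dimensional vector, the age is appended as one extra feature, and a linear layer maps the 513 values to class logits. The 7 × 7 spatial size, the division of age by 100, the 3 output classes, and the random weights are assumptions for illustration, not the authors' settings.

```python
import random


def global_avg_pool(feature_map):
    """Average each channel of a C x H x W map (nested lists) to a length-C vector."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]


def linear(x, weights, bias):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def classify_with_age(feature_map, age_years, weights, bias, age_scale=100.0):
    pooled = global_avg_pool(feature_map)      # 512 values for ResNet-18
    fused = pooled + [age_years / age_scale]   # append age (hypothetical scaling)
    return linear(fused, weights, bias)        # class logits


random.seed(0)
C, H, W, NUM_CLASSES = 512, 7, 7, 3   # assumed 7 x 7 final map and 3 classes
fmap = [[[random.random() for _ in range(W)] for _ in range(H)] for _ in range(C)]
fc_weights = [[random.gauss(0.0, 0.02) for _ in range(C + 1)] for _ in range(NUM_CLASSES)]
fc_bias = [0.0] * NUM_CLASSES
logits = classify_with_age(fmap, 72, fc_weights, fc_bias)
print(len(logits))  # 3
```

Only the first dimension of the final fully connected layer changes, from 512 to 513, so the pretrained convolutional backbone can be reused unchanged.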

S9. Classification Model's Results on Grading the Fundus Images
We compare our model's performance in grading its input image with that of the late-AMD grading module of DeepSeeNet [7].